On Indexing High Dimensional Data with Uncertainty

Authors

  • Charu C. Aggarwal
  • Philip S. Yu
Abstract

In this paper, we examine the problem of computing distance functions and indexing uncertain high-dimensional data for nearest-neighbor and range queries. Because of the inherent noise in uncertain data, traditional distance measures such as the Lq metric and their probabilistic variants are not qualitatively effective. This problem is further magnified by data sparsity in high dimensions. We examine methods of computing distance functions for high-dimensional data that are qualitatively effective and friendly to the use of indexes, and we show how to construct an effective index structure to handle uncertain similarity and range queries in high dimensions. Typical range queries in high-dimensional space specify ranges on only a subset of the dimensions. Furthermore, it is often desirable to run similarity queries over only a subset of the large number of dimensions. Such queries are difficult to resolve with traditional index structures, which use the entire set of dimensions. We propose query-processing techniques that use effective search methods on the index to compute the final results. We discuss experimental results on a number of real and synthetic data sets in terms of effectiveness and efficiency, and show that the proposed distance measures are not only more effective than traditional Lq norms, but can also be computed more efficiently over our proposed index structure.
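The abstract does not spell out the proposed distance function, so the following is only a minimal sketch of the kind of baseline it argues against: a plain Lq distance and one simple probabilistic variant (the expected squared Euclidean distance), both computed over a chosen subset of dimensions, plus a brute-force subspace range query with no index. It assumes each uncertain record is stored as per-dimension means and standard deviations, with coordinates treated as independent random variables; this data model and the function names `lq_distance`, `expected_sq_l2`, and `subspace_range_query` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence

# Assumption (not from the paper): an uncertain record is a vector of
# per-dimension means plus a vector of per-dimension standard deviations,
# and its coordinates are independent random variables.


def lq_distance(x: Sequence[float], y: Sequence[float], q: float = 2.0,
                dims: Optional[Sequence[int]] = None) -> float:
    """Deterministic Lq distance, optionally restricted to the dimensions in dims."""
    idx = range(len(x)) if dims is None else dims
    return sum(abs(x[i] - y[i]) ** q for i in idx) ** (1.0 / q)


def expected_sq_l2(mean: Sequence[float], std: Sequence[float], y: Sequence[float],
                   dims: Optional[Sequence[int]] = None) -> float:
    """Expected squared Euclidean distance E[sum_i (X_i - y_i)^2] over dims,
    using the identity E[(X_i - y_i)^2] = (mu_i - y_i)^2 + sigma_i^2."""
    idx = range(len(y)) if dims is None else dims
    return sum((mean[i] - y[i]) ** 2 + std[i] ** 2 for i in idx)


def subspace_range_query(means: Sequence[Sequence[float]],
                         stds: Sequence[Sequence[float]],
                         query: Sequence[float], radius: float,
                         dims: Sequence[int]) -> List[int]:
    """Brute-force scan (no index): ids of records whose expected squared
    distance to the query, measured over dims only, is at most radius**2."""
    return [k for k, (m, s) in enumerate(zip(means, stds))
            if expected_sq_l2(m, s, query, dims) <= radius ** 2]


if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0], [1.0, 0.0, 3.0]
    print(lq_distance(x, y))                         # 2.0
    print(lq_distance(x, y, dims=[0, 2]))            # 0.0
    print(expected_sq_l2(x, [0.5, 0.5, 0.5], y))     # 4.75

    # Noise accumulating with dimensionality: a record whose means equal the
    # query but which has unit noise in each of 100 dimensions...
    print(expected_sq_l2([0.0] * 100, [1.0] * 100, [0.0] * 100))          # 100.0
    # ...scores farther than a noise-free record that differs by 3 in one dimension.
    print(expected_sq_l2([3.0] + [0.0] * 99, [0.0] * 100, [0.0] * 100))   # 9.0

    means = [[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]
    stds = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
    print(subspace_range_query(means, stds, [1.0, 2.0, 100.0], 1.0, [0, 1]))  # [0]
```

Because every included dimension adds its variance sigma_i^2 whether or not the means agree, the error terms grow with the number of dimensions used. This is one way to see why noise and high dimensionality together can weaken such measures, and why running a query over only a relevant subset of dimensions can matter.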


Similar resources

A Divisive Hierarchical Clustering-Based Method for Indexing Image Information

It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. To date, much effort has gone into developing multi-dimensional indexing structures. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...

Compliance of the Structural Requirements of Iranian Medical Science Journals with Scopus Indexing Criteria

Background and Aim: In recent years, the number of health science research journals in Iran has increased. These journals should meet the standards and criteria required by international indexing databases. The aim of this study was to determine the degree to which the structural requirements of Iranian medical journals comply with the indexing criteria of the Scopus database...

Threshold Interval Indexing for Complicated Uncertain Data

Uncertain data is an increasingly prevalent topic in database research, given advances in instruments that inherently generate uncertainty in their data. In particular, the problem of indexing uncertain data for range queries has received considerable attention. To efficiently process range queries, existing approaches mainly focus on reducing the number of disk I/Os. However, due to the in...

Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML

As we move to a big data world in which unstructured data grows at high velocity, there is a need for data stores that can hold this large volume of data. Traditionally, relational databases have been used, but they are no longer suited to handling such large amounts of data, so it is necessary to move to non-relational data stores. In the current study, we have proposed an extension of the Mongo...

A Study of the Application of Cataloging and Filing Rules for Patient Index Cards in the Teaching Hospitals of Mazandaran University of Medical Sciences, 1383

Background and purpose: The master patient index (MPI) card is the key to locating a patient's record in the medical records department. Use of the MPI in hospital information systems is important, and an accurate MPI is taken into account in evaluation and accreditation programs. Our study examined the MPI in the medical records departments of the teaching hospitals of Mazandaran University of Medical Sciences with respect to the use of indexi...

Publication date: 2008